Designing caption production rules based on face, text, and motion detection

Authors

Claude Chapdelaine

Mario Beaulieu

Langis Gagnon

Abstract

Producing off-line captions for deaf and hearing-impaired people is a labor-intensive task that can require up to 18 hours of production per hour of film. Captions are placed manually close to the region of interest, but they must avoid masking human faces, text, or any moving objects that might be relevant to the story flow. Our goal is to use image processing techniques to shorten the off-line caption production process by automatically placing the captions on the proper consecutive frames. We implemented a computer-assisted captioning software tool that integrates detection of faces, text, and visual motion regions. Near-frontal faces are detected using a cascade of weak classifiers and tracked with a particle filter. Then, frames are scanned to perform text spotting and build a region map suitable for text recognition. Finally, motion mapping is based on the Lucas-Kanade optical flow algorithm and provides MPEG-7 motion descriptors. The combined detected items are then fed to a rule-based algorithm to determine the best caption location for the related sequences of frames. This paper focuses on the rules defined to assist human captioners and on the results of a user evaluation of this approach.
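The abstract does not list the actual production rules, so the following is only a minimal sketch of what a rule-based placement step of this kind might look like. It assumes the face, text, and motion detectors each return bounding boxes per frame, and it picks, for a sequence of consecutive frames, the caption slot that masks the least detected area. The names `Box`, `candidate_slots`, and `place_caption`, the three candidate slots, and the margin value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int

    def overlap_area(self, other: "Box") -> int:
        # Width and height of the intersection; non-positive means no overlap.
        dx = min(self.x + self.w, other.x + other.w) - max(self.x, other.x)
        dy = min(self.y + self.h, other.y + other.h) - max(self.y, other.y)
        return dx * dy if dx > 0 and dy > 0 else 0


def candidate_slots(frame_w, frame_h, cap_w, cap_h, margin=10):
    """Assumed candidate caption positions, in order of preference."""
    x = (frame_w - cap_w) // 2
    return [
        Box(x, frame_h - cap_h - margin, cap_w, cap_h),  # bottom centre
        Box(x, margin, cap_w, cap_h),                    # top centre
        Box(x, (frame_h - cap_h) // 2, cap_w, cap_h),    # middle
    ]


def place_caption(frame_w, frame_h, cap_w, cap_h, detections_per_frame):
    """Pick one slot for the whole frame sequence.

    detections_per_frame is a list (one entry per frame) of lists of Box
    objects covering detected faces, text regions, and moving objects.
    The slot with the smallest total masked area wins; ties go to the
    earlier (more preferred) slot because min() keeps the first minimum.
    """
    slots = candidate_slots(frame_w, frame_h, cap_w, cap_h)

    def masked_area(slot):
        return sum(slot.overlap_area(d)
                   for frame in detections_per_frame
                   for d in frame)

    return min(slots, key=masked_area)
```

Summing the overlap over all frames in the sequence keeps the caption in one place for its whole display time, which is one plausible reading of "the proper consecutive frames"; a real tool would also have to weigh proximity to the region of interest.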

Similar resources

Designing Caption Production Rules Based on Face, Text and Motion Detections

Visualizing Multimedia Content on Paper Documents: Components of Key Frame Selection for Video Paper

The components of a key frame selection algorithm for a paper-based multimedia browsing interface called Video Paper are described. Analysis of video image frames is combined with the results of processing the closed caption to select key frames that are printed on a paper document together with the closed caption. Bar codes positioned near the key frames allow a user to play the video from the...

PICTION: A System that Uses Captions to Label Human Faces in Newspaper Photographs*

It is often the case that linguistic and pictorial information are jointly provided to communicate information. In situations where the text describes salient aspects of the picture, it is possible to use the text to direct the interpretation (i.e., labelling objects) in the accompanying picture. This paper focuses on the implementation of a multi-stage system PICTION that uses captions to iden...

Inference and retrieval of soccer event

As to the soccer video, the event is defined as the medium-level spatiotemporal entity interesting to users, having certain context cues corresponding to the specific domain knowledge model. As a medium-level entity, the inference of soccer event is based on the fusion of context cues and domain knowledge model. The shooting event is chosen as research target and the event analysis method is ex...

Improving video captioning for deaf and hearing-impaired people based on eye movement and attention overload

Deaf and hearing-impaired people capture information in video through visual content and captions. Those activities require different visual attention strategies and up to now, little is known on how caption readers balance these two visual attention demands. Understanding these strategies could suggest more efficient ways of producing captions. Eye tracking and attention overload detections ar...

Publication date: 2008
